# Multimodal visual understanding
**Wr30a Deep 7B 0711 I1 GGUF** · Apache-2.0 · uploaded by mradermacher
A quantized GGUF build of prithivMLmods/WR30a-Deep-7B-0711 that supports multiple languages and is suited to tasks such as text generation and image captioning; a loading sketch follows this entry.
Image-to-Text · Transformers · multilingual

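GGUF quantizations like this one are normally loaded with a llama.cpp-based runtime rather than through the Transformers Python API. Below is a minimal text-generation sketch using llama-cpp-python; the filename and quantization level are assumptions, so substitute whichever .gguf file you actually download from the repository. Image captioning would additionally require the model's multimodal projector and a matching chat handler, if the repository provides one.

```python
# Minimal sketch: text generation from a local GGUF file with llama-cpp-python.
# The filename below is an assumption; pick any quantization level offered in the repo.
from llama_cpp import Llama

llm = Llama(
    model_path="WR30a-Deep-7B-0711.i1-Q4_K_M.gguf",  # assumed filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if a GPU-enabled build is installed
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a one-sentence caption for a photo of a mountain lake."}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```
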
**Gemma 3 12b It Quantized.w8a8** · uploaded by RedHatAI
An INT8 (W8A8) quantized version of google/gemma-3-12b-it that takes image and text input and produces text output, intended for efficient inference deployment; a serving sketch follows this entry.
Image-to-Text · Transformers

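W8A8 INT8 checkpoints of this kind are usually served with vLLM, which can read the quantization config directly. The following is a minimal offline-inference sketch, not the vendor's documented recipe: the repo id, the image placeholder in the prompt, and the multimodal input format are assumptions, so check the model card for the exact usage.

```python
# Minimal sketch: offline inference of an INT8 (w8a8) vision-language checkpoint with vLLM.
# The repo id and prompt template are assumptions; consult the model card for exact usage.
from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="RedHatAI/gemma-3-12b-it-quantized.w8a8", max_model_len=8192)  # assumed repo id

image = Image.open("photo.jpg")
prompt = "<start_of_image>Describe this image in one sentence."  # assumed image placeholder token

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.2, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```
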
**Qwen2.5 VL 72B Instruct GGUF** · Other license · uploaded by unsloth
A GGUF quantization of Qwen2.5-VL-72B-Instruct, the latest vision-language model in the Qwen family, with strong visual understanding and video analysis capabilities for domains such as finance and commerce.
Image-to-Text · Transformers · English

**Qwen2.5 VL 32B Instruct Exl2 4.25bpw** · Apache-2.0 · uploaded by christopherthompson81
An EXL2 quantization (4.25 bits per weight) of Qwen2.5-VL-32B-Instruct, a vision-language model in the Qwen family with strong multimodal understanding and generation across images, video, and text.
Image-to-Text · Transformers · English

**Amoral Gemma3 12B Vision** · uploaded by gghfez
A vision-enhanced variant of soob3123/amoral-gemma3-12B that pairs the Gemma3-12B language model with a vision encoder for multimodal tasks; a Transformers usage sketch follows this entry.
Image-to-Text · Transformers · English

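Because the card lists Transformers as the library, a generic image-text-to-text pipeline should be enough to try the model on a single image. This is only a sketch under assumptions: the repo id is inferred from the base model named in the description, and the pipeline task name requires a reasonably recent transformers release.

```python
# Minimal sketch: image captioning via the Transformers image-text-to-text pipeline.
# The repo id is an assumption inferred from the base model named in the description.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="soob3123/amoral-gemma3-12B-vision",  # assumed repo id
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image briefly."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])
```
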
**Qwen2 VL 72B Instruct GGUF** · Other license · uploaded by second-state
A GGUF quantization of Qwen2-VL-72B-Instruct for multimodal image-text-to-text generation; it can be served with LlamaEdge, as sketched below.
Image-to-Text · Transformers · English

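LlamaEdge's llama-api-server exposes an OpenAI-compatible endpoint once a GGUF file is loaded under WasmEdge, so any OpenAI client can query it. The sketch below assumes a server is already running locally on port 8080 and that the model was registered under the name shown; both details, and the base64 image format, are assumptions to verify against the second-state model card.

```python
# Minimal sketch: querying a locally running LlamaEdge (llama-api-server) instance
# through its OpenAI-compatible API. Port, model name, and image format are assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed-locally")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen2-VL-72B-Instruct",  # assumed name chosen when the server was launched
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": "What is shown in this image?"},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```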